If the U.S. Senate Was a Hedge Fund, How Would It Do?

Answer: Not very well...


By: Shane Copsey, MeiJade Hsu, and Athen Pham
CMSC320 - Introduction to Data Science, University of Maryland, Fall 2021, John Dickerson



For this analysis, our team answers the question:

How well do U.S. senators trade? Do they consistently outperform the S&P 500 index?

Using publicly-available data from 2012-2021, we will construct a few portfolios that simulate U.S. senators' stock trades (buying and selling of stocks) over this 9.5-year period and measure their performance against the S&P 500 index.

The S&P 500 index is the best-known indicator of the U.S. stock market's performance, measuring the value—and thus the performance—of the 500 largest (by market value) publicly-traded U.S. companies. This indicator is so important in trading and investing that many simply refer to it as "the market." The 500 constantly-changing companies in the S&P 500 are very well-known companies that we frequently encounter in our day-to-day lives:

SP-500-logos.jpeg

Many investors measure professional money managers' performance against the S&P 500. The thinking goes that an investor has two mutually-exclusive choices:

  1. Invest their money with a professional money manager, or
  2. Buy an exchange-traded fund (ETF) that tracks the performance of the S&P 500 (the market).

If the professional money manager doesn't outperform the S&P 500, then the investor may as well buy the S&P 500 ETF since they'd make more money that way, often at a lower cost and risk (usually with other diverse ETFs). Professional money managers charge fees and can make riskier investments like private equity. Therefore, to make it worthwhile for investors to accept a higher cost and risk by investing their money with them, professional money managers should perform better on a percentage basis than the market (usually expressed through the S&P 500).

Here's a chart of the S&P 500's value at the end of each trading day from 1950-2016. One way to easily interpret this chart: If you spent a handful of dollars to buy a single share of an S&P 500 ETF in 1950, you could sell that single share for over $2,000 in 2016:

sp.png

Studies have consistently shown that most professional money managers underperform the S&P 500—over a 15-year period, only 5% of professional money managers managed to beat it. Therefore, if we find that senators have outperformed the S&P 500 over the past 9.5 years, then our finding would be significant since it would show that senators—many of whom are lawyers and businessmen, not money managers—are better at investing than 95% of professional money managers. This would be a rare and impressive achievement indeed.

The question would then be: How can senators be above-average at investing?

Insider trading and the STOCK Act

Although insider trading—trading a company's stock based on nonpublic ("insider") information about the company—has been illegal and punishable by fines and/or imprisonment for most private citizens in the U.S. since 1934, it was not made illegal for members of Congress until 2012. Congress passed the Stop Trading on Congressional Knowledge (STOCK) Act which was then signed into law by President Obama on April 4, 2012, to ban insider trading among members of Congress.

Specifically, the STOCK Act prohibits members of Congress and other government officials from using, for personal benefit, insider information that they acquired through their government positions. This means that if a member of Congress acquires information that shows a positive future for a company, then they cannot buy that company's stock until the information they acquired becomes public. Also, if a member of Congress acquires information that shows a negative future for a company, then they cannot sell that company's stock until the information they acquired becomes public. The Act also requires all members of Congress to publicly disclose any financial transactions (which is mostly buying and selling) of stocks, bonds, commodities, futures, options, cryptocurrencies, and other securities within 30-45 days of the transaction.

In the nearly ten years since the STOCK Act became law, however, stock trading by members of Congress has repeatedly caused controversy and has raised ethical concerns. In 2020, a major insider trading scandal occurred when six senators sold stocks after attending a confidential briefing on the coronavirus outbreak in January 2020, potentially avoiding losses when the stock market crashed in March 2020. In 2021, Business Insider identified 127 current members of Congress who had either personally or whose staffers had violated provisions of the STOCK Act:

Business Insider STOCK Act Investigation.png

Before the STOCK Act made insider trading by members of Congress illegal, members of Congress had performed exceptionally well compared to the S&P 500. A 2004 study in the Journal of Financial and Quantitative Analysis found that senators outperformed the S&P 500 by an astonishing average of 12 percentage points per year (or 1 percentage point per month) during a 6-year period in the bull market of the 1990s. This means that if the S&P 500's value increased by 10% over a year, then the value of a portfolio consisting of all trades made by senators during that year would increase by 22%.

Senators didn't just beat an already high-performing market: They also beat corporate insiders (people who were trading their own company's stock)—who outperformed the market by an average of 6 percentage points per year—and U.S. households—who underperformed the market by an average of 1.4 percentage points per year. Senators also appeared to know exactly when to buy and sell stocks: They would buy stocks just before the stocks would suddenly outperform the market by more than 25%, and they would sell stocks that had been beating the market by about 25% for the past year just before those stocks would fall back in line with the market's performance. The study's researchers suggested that senators knew when to buy and sell stocks because they had access to information based on their government positions that other investors didn't have—in other words, they were insider trading.

Therefore, another question that our analysis will help answer is:

Is the STOCK Act preventing U.S. senators from benefiting from insider trading?

If we find that senators continued to outperform the S&P 500 from 2012-2021 (i.e., after the STOCK Act was signed into law), then we may have evidence that the STOCK Act didn't achieve its goal of preventing senators from benefiting from insider trading. On the flip side, if we find that senators didn't outperform the S&P 500 from 2012-2021, then we may have evidence that the STOCK Act did prevent senators from benefiting from insider trading.

To answer this question, we'll treat the U.S. Senate as one cohesive unit. We'll download data on all of the stock trades that they've disclosed since the STOCK Act was signed in 2012 and use Python to construct portfolios that simulate their trades (buys and sells) over this 9.5-year period. This means that we'll buy about (exact transaction data was not revealed, but ranges of amounts bought and sold were) the same amount of the same stock that they bought on the same day in the past and sell about the same amount of the same stock that they sold on the same day in the past. Then, we'll convert the portfolio into an index that we'll compare with the S&P 500 index to see whether senators performed better than the S&P 500 on a percentage basis in the 9.5 years since the STOCK Act was signed into law.

Importing and cleaning the data

We downloaded four datasets. For each dataset, we downloaded data from July 25, 2021 (when the first trade was disclosed under the newly-enacted STOCK Act) to December 8, 2021 (when we were writing the code for this project):

  1. A CSV of all transactions made by senators from SenateStockWatcher.com, a free website maintained by members of the public who compile transactions disclosed by senators into CSV and JSON files. On the API page, under the "Bulk Downloads" section, click on the "Download" link in the "All Transactions" row whose file type is a CSV.
  2. A CSV of the daily closing prices and other data of all stocks traded by senators from Compustat, a research-grade database well-known among academics and professionals in the financial world. We received access to Compustat through an educational license. Members of the public can query APIs maintained by third-party websites including TradingView and Polygon, although they may have to pay for a subscription.
  3. A CSV of the daily closing values of the S&P 500 index from the Nasdaq website maintained by Nasdaq, which owns and operates the Nasdaq Stock Market. The Nasdaq Stock Market is the 2nd-largest stock market in the world (behind the New York Stock Exchange) and where many well-known tech companies like Google, Apple, Facebook, Amazon, and Microsoft trade.
  4. A CSV of the daily closing values of the Nasdaq Composite from the Nasdaq website. The Nasdaq Composite measures the value, and thus the performance, of almost all stocks that trade on the Nasdaq stock exchange. We compare the Senate's performance against the Nasdaq since other 3rd-party analyses suggest that members of Congress buy and sell companies in the information technology sector the most frequently.

Importing Libraries:

In [1]:
!pip install pandas_market_calendars
from google.colab import drive
import re
import numpy as np
import pandas as pd
import math
import inspect
from enum import Enum, unique
import matplotlib.pyplot as plt
import pandas_market_calendars as mcal
import warnings
warnings.filterwarnings("ignore") # (This code was added at the end to get rid of some warnings below.)
Collecting pandas_market_calendars
  Downloading pandas_market_calendars-3.2-py3-none-any.whl (87 kB)
     |████████████████████████████████| 87 kB 2.1 MB/s 
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from pandas_market_calendars) (2.8.2)
Requirement already satisfied: pandas>=0.18 in /usr/local/lib/python3.7/dist-packages (from pandas_market_calendars) (1.1.5)
Collecting exchange-calendars>=3.3
  Downloading exchange_calendars-3.5.1.tar.gz (147 kB)
     |████████████████████████████████| 147 kB 38.0 MB/s 
Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from pandas_market_calendars) (2018.9)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from exchange-calendars>=3.3->pandas_market_calendars) (1.19.5)
Collecting pyluach
  Downloading pyluach-1.3.0-py3-none-any.whl (17 kB)
Requirement already satisfied: toolz in /usr/local/lib/python3.7/dist-packages (from exchange-calendars>=3.3->pandas_market_calendars) (0.11.2)
Requirement already satisfied: korean_lunar_calendar in /usr/local/lib/python3.7/dist-packages (from exchange-calendars>=3.3->pandas_market_calendars) (0.2.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil->pandas_market_calendars) (1.15.0)
Building wheels for collected packages: exchange-calendars
  Building wheel for exchange-calendars (setup.py) ... done
  Created wheel for exchange-calendars: filename=exchange_calendars-3.5.1-py3-none-any.whl size=179603 sha256=e571ff2e255005fba1e59ff7aacea1b1a294f5876a7d0d804975eb415857e8f1
  Stored in directory: /root/.cache/pip/wheels/23/3c/5e/daa68c2efa75e621711c5b0a4eb50e0c338f98cfc00a883fb3
Successfully built exchange-calendars
Installing collected packages: pyluach, exchange-calendars, pandas-market-calendars
Successfully installed exchange-calendars-3.5.1 pandas-market-calendars-3.2 pyluach-1.3.0

Dataset #1: All transactions made by U.S. senators from SenateStockWatcher.com

In [2]:
try:
  drive.mount("/content/drive") # Exception handling here in case the drive is already mounted.
except:
  pass

pd.set_option("display.max_rows", 10)

senate_data = pd.read_csv("/content/drive/My Drive/CMSC320 Final Tutorial/all_transactions_SENATE_12-8-21.csv")
senate_data.head() # Previewing the first five rows of the Senate transaction data.
Mounted at /content/drive
Out[2]:
transaction_date owner ticker asset_description asset_type type amount comment senator ptr_link disclosure_date
0 11/29/2021 Self CAG ConAgra Brands, Inc. Common Stock Stock Purchase 1,00115,000 -- Gary C Peters https://efdsearch.senate.gov/search/view/ptr/6... 12/7/2021
1 11/29/2021 Self OFC Corporate Office Properties Trust Common Stock Stock Purchase 1,00115,000 -- Gary C Peters https://efdsearch.senate.gov/search/view/ptr/6... 12/7/2021
2 11/9/2021 Self -- VZ Stock Purchase 1,00115,000 -- Gary C Peters https://efdsearch.senate.gov/search/view/ptr/6... 12/7/2021
3 11/9/2021 Child -- W.L. Gore &amp; Associates Inc. <div class="te... Non-Public Stock Sale (Partial) 1,00115,000 Gorfam Inc. sold stock to benefit another shar... Christopher A Coons https://efdsearch.senate.gov/search/view/ptr/3... 11/29/2021
4 11/9/2021 Child -- W.L. Gore &amp; Associates Inc. <div class="te... Non-Public Stock Sale (Partial) 1,00115,000 Gorfam Inc. sold stock to benefit another shar... Christopher A Coons https://efdsearch.senate.gov/search/view/ptr/3... 11/29/2021

If you go through all 8,928 rows of the Senate data, you'll see that the data are inconsistently formatted. This may be due to the Senate changing the format over time to report market activity. Specifically:

  1. For some stock trades, the stock's ticker (an abbreviation that the stock trades under on the stock exchange) is located in the asset_description column instead of the ticker column, where it should belong. Therefore, one would have to use a regular expression to find tickers that were improperly listed in the asset_description column, then copy them into the ticker column.
  2. The asset_type column, which specifies the type of security or derivative that was traded (i.e., stock, bond, option, etc.), is blank for transaction dates 2012/07/25 to 2014/12/18, after which senators began filling it out. Therefore, for transaction dates 2012/07/25 to 2014/12/18, one would have to look at the ticker and asset_description columns to determine whether a particular transaction was a stock trade.
  3. From 2012/07/25 to 2014/01/22, senators disclosed all transactions in PDF format. The volunteers who maintain SenateStockWatcher.com did not transcribe these disclosures into CSV format, so we'll, unfortunately, have to exclude all these transactions from our analysis. This means that there will be fewer data points from 2012 to 2014 (we still have some, just not a lot) compared with 2015 through 2021.

Therefore, given the inconsistent formatting of the Senate dataset, we'll have to clean it before we can analyze it.

In [3]:
# Only keep stocks with a valid ticker, amount, transaction date, transaction type, senator, and disclosure date
ticker_csv_pattern = r"^[A-Z0-9-\.]{1,5}$" # Tickers may be longer than five characters, but here it is limited to five to avoid complication
# This pattern is used below for finding tickers in the asset_description column
# The actual pattern should be 1 to 9 characters instead of 1 to 5, and without dashes (some of the data in the spreadsheet had dashes)
senate_data = senate_data[(senate_data.type == 'Purchase') | 
                          (senate_data.type == 'Sale (Full)') | 
                          (senate_data.type == 'Sale (Partial)')] # Keep only these transaction types
# This for loop salvages rows where tickers were put in the asset_description column, but not in the ticker column:
for row, row_str in senate_data[senate_data.ticker == "--"].iterrows():
  asset_description = senate_data.loc[row, 'asset_description'].upper()
  if (re.match(ticker_csv_pattern, asset_description)):
    senate_data.loc[row, 'ticker'] = asset_description
senate_data = senate_data[senate_data.ticker != '--'] # Remove blank tickers
# This for loop salvages rows where the asset_type was left blank, but it was clear that the row was a stock transaction:
for row, row_str in senate_data.iterrows():
  if (str(senate_data.loc[row, 'asset_type']) == "nan"): # If the asset_type is empty, then it will be a float, so converted to a string it is nan
    asset_description = senate_data.loc[row, 'asset_description'].upper()
    # This if statement checks for key words in the description that would tell that the transaction is an options transaction
    # It is a somewhat rough way to check if the senator is dealing with an options contract
    if ("OPTION" not in asset_description and "CALL" not in asset_description and "PUT" not in asset_description):
      senate_data.loc[row, 'asset_type'] = "Stock"
# Using the two for loops above salvaged a lot of data adding about 700 transactions and over 100 tickers to the data.
senate_data = senate_data[senate_data.asset_type == "Stock"] # Only keep stocks from the data (excludes bonds and derivatives)
senate_data = senate_data.drop(columns = ['owner', 'asset_description', 'asset_type', 'comment', 'ptr_link'])
senate_data = senate_data[senate_data.ticker != "SPY160219P00180000"] # One senator listed SPY puts as stocks instead of options
senate_data = senate_data.dropna()

# Given that senators don't report the exact amounts that they traded, we must approximate trade amounts. We chose to take 
# the lower end of the ranges that they reported to avoid overestimating trade amounts and better account for partial (instead of 
# full) stock sales.
def f(row):
    if row['amount'] == "$1,001 - $15,000":
        val = 1001
    elif row['amount'] == "$15,001 - $50,000":
        val = 15001
    elif row['amount'] == "$50,001 - $100,000":
        val = 50001
    elif row['amount'] == "$100,001 - $250,000":
        val = 100001
    elif row['amount'] ==  "$250,001 - $500,000":
        val = 250001
    elif row['amount'] == "$500,001 - $1,000,000":
        val = 500001
    elif row['amount'] == "$1,000,001 - $5,000,000":
        val = 1000001
    elif row['amount'] == "$5,000,001 - $25,000,000":
        val = 5000001
    elif row['amount'] == "$25,000,001 - $50,000,000":
        val = 25000001
    elif row['amount'] == "Over $50,000,000":
        val = 50000001
    return val

senate_data['lower_bound_amount'] = senate_data.apply(f, axis=1) # Convert trade amounts to the lower end of reported ranges.
senate_data.rename(columns = {'type':'transaction_type'}, inplace=True)
senate_data['transaction_date'] = pd.to_datetime(senate_data.transaction_date)
senate_data['disclosure_date'] = pd.to_datetime(senate_data.disclosure_date)
senate_data = senate_data.sort_values(by = 'transaction_date') # Sort trades by chronological order.
print("Number of transactions: " + str(len(senate_data)))
print("Number of tickers: " + str(len(senate_data.ticker.unique())))
senate_data.head(100) # Preview the cleaned senate stock transaction data.
Number of transactions: 6474
Number of tickers: 1017
Out[3]:
transaction_date ticker transaction_type amount senator disclosure_date lower_bound_amount
7706 2012-09-13 DD Purchase 1,00115,000 Thomas R Carper 2015-05-13 1001
7705 2012-12-17 DD Purchase 1,00115,000 Thomas R Carper 2015-05-13 1001
7704 2013-03-15 DD Purchase 1,00115,000 Thomas R Carper 2015-05-13 1001
8484 2013-04-29 CG Purchase 100,001250,000 John Hoeven 2014-05-08 100001
7703 2013-06-13 DD Purchase 1,00115,000 Thomas R Carper 2015-05-13 1001
... ... ... ... ... ... ... ...
8596 2014-03-13 BEAV Purchase 15,00150,000 Sheldon Whitehouse 2014-03-26 15001
8595 2014-03-13 VFC Sale (Partial) 1,00115,000 Sheldon Whitehouse 2014-03-26 1001
8590 2014-03-13 VFC Sale (Partial) 1,00115,000 Sheldon Whitehouse 2014-03-26 1001
8589 2014-03-13 KO Sale (Partial) 1,00115,000 Sheldon Whitehouse 2014-03-26 1001
8608 2014-03-14 AEP Sale (Partial) 1,00115,000 Susan M Collins 2014-03-25 1001

100 rows × 7 columns

Compustat takes a `.txt` file containing the tickers we want data for with one ticker on each line. Therefore, we'll write all of the unique tickers that senators traded to a `.txt` file and reformat some of them to match Compustat's symbol convention to avoid errors.

In [4]:
# Write all unique tickers to a file. Before giving Compustat the list of tickers to parse, we have to ensure that all tickers 
# follow Compustat's symbol convention. Some tickers with non-alphabetic characters (dashes, dots, and numbers) aren't formatted 
# according to Compustat's convention. Thus, we reformatted them manually by cross-checking them against Compustat's 
# convention.
ticker_file = open("/content/drive/My Drive/CMSC320 Final Tutorial/unique_tickers_UNCLEANED.txt", "w")
for ticker in senate_data.ticker.unique():
     ticker_file.write(ticker)
     ticker_file.write("\n")
ticker_file.close()

# changed_tickers is a dictionary that stores, as keys, tickers from senate_data with non-alphabetic characters (dashes, dots, 
# and numbers) and, as values, the tickers reformatted according to Compustat's convention.
changed_tickers = {
    "HUB-B": "HUB.B", 
    "COF-PP": "COF", 
    "XLS-WI": "XLS.WI", 
    "BUNT-RP": "BUNT", 
    "SBUX.SW": "SBUX", 
    "ADT.F": "ADT.F", 
    "RDS-B": "RDS.B", 
    "SYY.SG": "SYY", 
    "BRK-B": "BRK.B", 
    "UA-C": "UA", 
    "NEE-PC": "NEE.PC", 
    "ZNGA.SW": "ZNGA", 
    "RDS-A": "RDS.A", 
    "IBM.MX": "IBM", 
    "MSTY.PA": "MSTY.PA", 
    "LGF-B": "LGF.B", 
    "LM09.SG": "LM09", 
    "RDSA.AS": "RDSA", 
    "0QZI.IL": "0QZI", 
    "3V64.TI": "3V64"
    }

# Reformat tickers in senate_data according to Compustat's convention:
for row, row_str in senate_data.iterrows():
  ticker = senate_data.at[row, 'ticker']
  if ticker in changed_tickers.keys(): # If the ticker is in the list of keys that we have to reformat
    senate_data.at[row, 'ticker'] = changed_tickers[ticker] # Reformat the ticker

# After reformatting tickers, write all unique tickers to a file. We'll tell Compustat to download price and volume data for 
# every unique ticker from 2012 to 2021.
ticker_file = open("/content/drive/My Drive/CMSC320 Final Tutorial/unique_tickers_AUTOMATICALLY_CLEANED.txt", "w")
for ticker in senate_data.ticker.unique():
     ticker_file.write(ticker)
     ticker_file.write("\n")
ticker_file.close()

Dataset #2: Daily closing prices and other data from Compustat

Downloading data from Compustat is a manual process that takes place outside of this tutorial. Thus, after we download our data from Compustat as a CSV, we'll load it into our notebook.

In [5]:
# Take historical compustat data for senate tickers:
compustat = pd.read_csv("/content/drive/My Drive/CMSC320 Final Tutorial/1010_stock_prices.csv")
compustat.rename(columns = {'gvkey': 'primary_key', 
                            'iid': 'issue_id', 
                            'datadate': 'trading_date', 
                            'tic': 'ticker', 
                            'conm': 'company_name', 
                            'cshtrd': 'trading_volume', 
                            'prccd': 'close', 
                            'prchd': 'high', 
                            'prcld': 'low', 
                            'prcod': 'open', 
                            'trfd': 'total_return_factor', 
                            'tpci': 'security_type', 
                            'ajexdi': 'adjustment_factor'}, 
                 inplace = True)
compustat['trading_date'] = pd.to_datetime(compustat.trading_date)
print("Number of tickers from the Compustat data: " + str(len(compustat.ticker.unique())))
compustat.head() # Preview the data grabbed from compustat.
Number of tickers from the Compustat data: 1550
Out[5]:
primary_key issue_id trading_date ticker company_name adjustment_factor trading_volume close high low open total_return_factor secstat security_type costat dlrsn ggroup gind gsector gsubind spcsrc stko dldte ipodate
0 1045 1 2012-07-25 AAMRQ AMERICAN AIRLINES GROUP INC 1.0 276272.0 0.513 0.525 0.510 0.514 2.145309 I 0 A NaN 2030.0 203020.0 20.0 20302010.0 C 0.0 NaN NaN
1 1045 1 2012-07-26 AAMRQ AMERICAN AIRLINES GROUP INC 1.0 1741493.0 0.514 0.515 0.501 0.515 2.145309 I 0 A NaN 2030.0 203020.0 20.0 20302010.0 C 0.0 NaN NaN
2 1045 1 2012-07-27 AAMRQ AMERICAN AIRLINES GROUP INC 1.0 1128873.0 0.504 0.540 0.500 0.519 2.145309 I 0 A NaN 2030.0 203020.0 20.0 20302010.0 C 0.0 NaN NaN
3 1045 1 2012-07-30 AAMRQ AMERICAN AIRLINES GROUP INC 1.0 593948.0 0.510 0.520 0.502 0.502 2.145309 I 0 A NaN 2030.0 203020.0 20.0 20302010.0 C 0.0 NaN NaN
4 1045 1 2012-07-31 AAMRQ AMERICAN AIRLINES GROUP INC 1.0 450541.0 0.507 0.520 0.500 0.510 2.145309 I 0 A NaN 2030.0 203020.0 20.0 20302010.0 C 0.0 NaN NaN

Compustat has 74 unique variables that you can download for each ticker. We downloaded 21 of them, although we didn't end up using all of them in our analysis. The CSV that we downloaded has 2,591,494 rows and 363 MB of data, so it is a large dataset.

Compustat provides the open, high, low, and close prices, the trading volume, and the adjustment factor for each trading day in our analysis period (2012/07/25 to 2021/12/08) for each ticker that senators traded. Also, under the `gsector` column, Compustat lists the Global Industry Classification Standard (GICS®) sector of the stock associated with that row. There are 11 GICS® sectors, which represent the major economic sectors that businesses operate within: Consumer Discretionary, Consumer Staples, Energy, Materials, Industrials, Healthcare, Financials, Information Technology, Real Estate, Communication Services, and Utilities. Here's a chart showing the sector breakdowns for the companies in the S&P 500:

sp_sectors.png

The adjustment factor is a value that we divide each stock's price by to avoid misrepresentation of any stock's relative worth. Why?

Companies can perform a stock split—when they split a single share of stock into more shares of stock—if they want to decrease the price of their stock. For example, if you own one share of Tesla stock and Tesla performs a 2-for-1 stock split when its stock price is 100,thenyounowowntwosharesofTesla(sinceyougettwosharesforeachshareofTeslathatyouown)andTeslasstockpriceisnow50 (since each share was split in half, so its price must also be half the original price). However, the total value of your Tesla stock is still 100(twosharesmultipliedbya50 price per share). Think of stock splits like cutting a cake—after cutting the cake in half, the cake is still the same size, just split in half. There are also reverse stock splits which are far less common than regular stock splits—at least with stocks in the S&P 500 and other large stocks.

We have to adjust stock prices for stock splits so that we accurately calculate gains (profits) and losses on trades over time since senators will buy and sell the same stock at vastly different prices if that stock splits. Adjusting stock prices for stock splits leads to an apples-to-apples comparison between stock prices for the same company at different points in time.

In [6]:
# Adjust share prices to reflect stock splits. The adjustment factor on a date is the amount that we divide share prices on that 
# date by to reflect the cumulative effect of stock splits.
compustat['adj_close'] = compustat['close'] / compustat['adjustment_factor']
compustat['adj_high'] = compustat['high'] / compustat['adjustment_factor']
compustat['adj_low'] = compustat['low'] / compustat['adjustment_factor']
compustat['adj_open'] = compustat['open'] / compustat['adjustment_factor']

Since Compustat won't have every ticker that we want, let's find the tickers that we reformatted that it has as well as the tickers that we reformatted that it doesn't have. We'll have to exclude the tickers that Compustat doesn't have from our analysis, although they shouldn't have a noticeable impact since there are only a few of them.

In [7]:
print("Changed tickers: " + str(changed_tickers) + "\n")
print("Found:")
for key in changed_tickers:
  if not compustat.loc[compustat['ticker'] == changed_tickers[key]].empty or not compustat.loc[compustat['ticker'] == key].empty:
    print(changed_tickers[key])
print("\nNOT found:")
for key in changed_tickers:
  if compustat.loc[compustat['ticker'] == changed_tickers[key]].empty and compustat.loc[compustat['ticker'] == key].empty:
    print(changed_tickers[key])
Changed tickers: {'HUB-B': 'HUB.B', 'COF-PP': 'COF', 'XLS-WI': 'XLS.WI', 'BUNT-RP': 'BUNT', 'SBUX.SW': 'SBUX', 'ADT.F': 'ADT.F', 'RDS-B': 'RDS.B', 'SYY.SG': 'SYY', 'BRK-B': 'BRK.B', 'UA-C': 'UA', 'NEE-PC': 'NEE.PC', 'ZNGA.SW': 'ZNGA', 'RDS-A': 'RDS.A', 'IBM.MX': 'IBM', 'MSTY.PA': 'MSTY.PA', 'LGF-B': 'LGF.B', 'LM09.SG': 'LM09', 'RDSA.AS': 'RDSA', '0QZI.IL': '0QZI', '3V64.TI': '3V64'}

Found:
COF
BUNT
SBUX
RDS.B
SYY
BRK.B
UA
NEE.PC
ZNGA
RDS.A
IBM
LGF.B

NOT found:
HUB.B
XLS.WI
ADT.F
MSTY.PA
LM09
RDSA
0QZI
3V64

Among the tickers that we didn't reformat, Compustat may not have some of those, either. We'll have to exclude these tickers from our analysis, too. Let's identify them just in case they give us errors when we construct our portfolios/indices. We conjecture that Compustat may not have some of these tickers since they're no longer publicly traded for reasons including "going private" (e.g., YHOO (Yahoo), which is now a privately-owned company), going bankrupt, being acquired by another company, etc. A major difficulty that anyone will run into when they try to analyze past stock market data is dealing with companies whose stocks no longer trade publicly. There may also be mutual funds and foreign stocks (outside the U.S.) unavailable in Compustat's dataset.

In [8]:
senate_data['ticker'] = senate_data['ticker'].str.replace("-",".")
senate_data_tickers = senate_data.ticker.unique()
compustat_tickers = compustat.ticker.unique()
compustat_extra_tickers = [ticker for ticker in compustat_tickers if ticker not in senate_data_tickers]
print("Extra tickers from Compustat: " + str(len(compustat_extra_tickers)))
#print("Those tickers listed: " + str(compustat_extra_tickers))
senate_data_tickers_extra_tickers = [ticker for ticker in senate_data_tickers if ticker not in compustat_tickers]
print("Tickers in senate_data not found in Compustat (likely many mutual funds and foreign stocks): " + str(len(senate_data_tickers_extra_tickers)))
#print("Those tickers listed: " + str(senate_data_tickers_extra_tickers))
Extra tickers from Compustat: 630
Tickers in senate_data not found in Compustat (likely many mutual funds and foreign stocks): 89

Constructing Senate portfolios/indices

In [9]:
stocks_historic_close = {} # This is a dictionary, by date, of a dictionary, by ticker, of (of a dictionary of) all stocks' closing prices.
stocks_historic_open = {} # the same, but for opening prices
stocks_historic_high = {} # the same, but for highs
stocks_historic_low = {} # the same, but for lows
stocks_historic_TDRF = {} # the same, but for the total daily return factor
stocks_historic_close = compustat.groupby('trading_date').apply(lambda x: dict(zip(x.ticker, x.adj_close))).to_dict()
stocks_historic_open = compustat.groupby('trading_date').apply(lambda x: dict(zip(x.ticker, x.adj_open))).to_dict()
stocks_historic_high = compustat.groupby('trading_date').apply(lambda x: dict(zip(x.ticker, x.adj_high))).to_dict()
stocks_historic_low = compustat.groupby('trading_date').apply(lambda x: dict(zip(x.ticker, x.adj_low))).to_dict()
stocks_historic_TDRF = compustat.groupby('trading_date').apply(lambda x: dict(zip(x.ticker, x.total_return_factor))).to_dict()
stocks_historic_sector = compustat.groupby('trading_date').apply(lambda x: dict(zip(x.ticker, x.gsector))).to_dict()
# Grouping all of these dictionaries into one big nice dictionary:
stocks_historic_data = {'close': stocks_historic_close,
                        'open': stocks_historic_open,
                        'high': stocks_historic_high,
                        'low': stocks_historic_low,
                        'TDRF': stocks_historic_TDRF,
                        'sector': stocks_historic_sector}

transactions_by_senator = {} # This is a dictionary list, by senator, of dataframes of stock transactions.
senators = senate_data.senator.unique()
for senator in senators:
  transactions_by_senator[senator] = senate_data[senate_data['senator'] == senator]
In [10]:
nyse = mcal.get_calendar('NYSE')
trading_dates = np.sort(compustat['trading_date'].unique())
schedule = nyse.schedule(trading_dates[0], trading_dates[-1])
trading_dates = mcal.date_range(schedule, frequency='1D')
trading_dates = trading_dates.tz_localize(None)
trading_dates = [date.replace(hour=0) for date in trading_dates]
trading_dates = np.array(trading_dates, dtype='datetime64[D]')
# This does not factor in all exchanges and such, but works for this project.

# Only use tickers from the Compustat data (this excludes some of the tickers from the senate data unfortunately).
senate_data_tickers = [ticker for ticker in senate_data_tickers if ticker not in senate_data_tickers_extra_tickers]

for ticker in senate_data_tickers_extra_tickers:
  senate_data = senate_data[senate_data['ticker'] != ticker]

senators = senate_data['senator'].unique()
print("Number of senators trading stocks in the data: " + str(len(senators)))

# Different types of trade executions for generating different types of indices.
@unique
class trade_execution(Enum):
  CLOSE_TO_CLOSE = 0
  WORST_CASE = 1
  BEST_CASE = 2

# worst- and best-case still use the lower-bound amount—which is solely used for all emulated purchases and partial sales for that matter currently—
# when purchasing and partially selling,
# so they are not truly the worst- and best-case of all,
# but rather the worst- and best-case of trading execution

def index_calculator(index_start = 100.0,
                     senators_equally_weighed = False,
                     execution = trade_execution.CLOSE_TO_CLOSE,
                     begin_date = 0,
                     end_date = -1,
                     date_column = "transaction_date",
                     selected_senators = senators):
# index_start is like as if someone put in that amount of money in the index to start with
# begin_date and end_date are indices in the trading_dates list which will select that range
# senators_equally_weight is a boolean variable which tells whether each senator's holdings should be equally weighed or not
#   not being equally weighed is the default, of course, and this treats the whole senate as one big portfolio
# execution may be three values for how purchase and sales are handled:
#   CLOSE_TO_CLOSE is the default where the closing price is used on purchase and sale
#   WORST_CASE is where the trades are purchased at the high of the day and sold at the low of the day
#   BEST_CASE is where the trades are purchased at the low of the day and sold at the high of the day
# date_column is for selecting the dates column in the senate_data dataframe when transactions occur
#   basically, one could leave it the default "transaction_date" and everything happens as if the senate is trading,
#   but if one wanted to see how well they would do mimicking the senate as they see what they do, then they would
#   want to use the disclosure_date (when the trade was disclosed to the public)
#   # MAKE SURE that if disclosure dates are used that they are adjusted to the next trading day or time if they are on a non-trading day
#   # This function may be updated to accommodate for this and should be
#   # The adjustment would be added after checking the arguments as part of the initialization
#   # The date column would be copied and that copy would be used as to avoid overwriting the original object/dataframe
# selected_senators is the list of senators looked at when creating an index

  if ((not isinstance(index_start, float) and not isinstance(index_start, int)) or index_start <= 0
      or not isinstance(begin_date, int) or not isinstance(end_date, int) or (end_date != -1 and begin_date < end_date) or begin_date < -1 or end_date < -1
      or begin_date > len(trading_dates) or end_date > len(trading_dates)
      or (senators_equally_weighed != False and senators_equally_weighed != True)
      or not isinstance(execution, trade_execution)
      or (date_column != "transaction_date" and date_column != "disclosure_date")
      or (not isinstance(selected_senators, np.ndarray) and not isinstance(selected_senators, list)) or selected_senators == []):
    # This if statement detects possible problematic input to the function and raises an exception if so
    raise Exception("Invalid arguments passed to " + str(inspect.currentframe().f_code.co_name))

  selected_trading_dates = trading_dates[begin_date:end_date] # selects trading dates based on the arguments
  days_until_first_trade = 0 # how many days after the first date until the first day with a Senate purchase transaction
  big_gains_losses = pd.DataFrame(columns=['date', 'index_today', 'index_yesterday', 'gain', 'current_holdings']) # debug variable
  level = pd.DataFrame(columns=['date', 'index_today', 'index_yesterday', 'current_holdings']) # debug variable
  holdings_by_day = [] # list of holdings as each day passes
  gain_loss_by_day = [] # list of gain or loss factor for each day
  index_by_day = [] # index list
  current_holdings = {} # current holdings by ticker (in cash)
  shares_by_senator = {} # current shares by senator by ticker

  purchases_yesterday = {} # purchases yesterday by ticker (in cash)
  shares_by_senator_yesterday = {} # shares purchased yesterday by senator by ticker

# first day:
  holdings_by_day.append(0) # initialization
  gain_loss_by_day.append(0.00) # initialization
  index_by_day.append(index_start) # initialization

  # Set initial shares to 0 for all senators
  for senator in selected_senators:
    shares_by_senator[senator] = {}
    shares_by_senator_yesterday[senator] = {}
    for ticker in senate_data_tickers:
      shares_by_senator[senator][ticker] = 0
      shares_by_senator_yesterday[senator][ticker] = 0
  for ticker in senate_data_tickers:
    current_holdings[ticker] = 0
    purchases_yesterday[ticker] = 0

  current_year = selected_trading_dates[days_until_first_trade].astype('M8[Y]')
  # sectors are numbered 10.0 through 60.0 by 5s or nan if no value
  purchases_by_year = {} # total purchases and purchases by sector by calendar year
  purchases_by_year[current_year] = {'total': 0,
                                     'by_sector': {'10.0': 0,
                                                   '15.0': 0,
                                                   '20.0': 0,
                                                   '25.0': 0,
                                                   '30.0': 0,
                                                   '35.0': 0,
                                                   '40.0': 0,
                                                   '45.0': 0,
                                                   '50.0': 0,
                                                   '55.0': 0,
                                                   '60.0': 0,
                                                   'nan': 0}}
  annual_return = {} # annual return factor by calendar year
  annual_return[current_year] = 1.0

  purchase_sell_errors = [] # debug variable
  other_access_errors_by_day = [] # debug variable

  for date in selected_trading_dates:
    today_errors = "|"; # debug variable
    if (current_year != date.astype('M8[Y]')): # this if statement and bit of code is in case the first day starts the next calendar year
      current_year = date.astype('M8[Y]')
      annual_return[current_year] = 1.0
      purchases_by_year[current_year] = {'total': 0,
                                         'by_sector': {'10.0': 0,
                                                       '15.0': 0,
                                                       '20.0': 0,
                                                       '25.0': 0,
                                                       '30.0': 0,
                                                       '35.0': 0,
                                                       '40.0': 0,
                                                       '45.0': 0,
                                                       '50.0': 0,
                                                       '55.0': 0,
                                                       '60.0': 0,
                                                       'nan': 0}}
    current_transactions = senate_data[(senate_data[date_column] == date) & (senate_data['transaction_type'] == "Purchase")]
    num_actual_transactions = 0 # this variable is in case selected_senators does not include all senators and transactions are ignored below due to this
    for row, row_str in current_transactions.iterrows():
      ticker = current_transactions.loc[row, 'ticker']
      senator = current_transactions.loc[row, 'senator']
      if (senator not in selected_senators):
        continue
      amount = current_transactions.loc[row, 'lower_bound_amount']
      date = current_transactions.loc[row, date_column]
      if (stocks_historic_data['close'][pd.Timestamp(date)].get(ticker) == None):
        today_errors += (str(ticker) + " at " + str(date) + "|") # for debug
        continue
      num_actual_transactions += 1
      # shares are based off of the price at that day's close
      shares = amount/(stocks_historic_data['close'][pd.Timestamp(date)][ticker])
      shares_by_senator[senator][ticker] += shares
      shares_by_senator_yesterday[senator][ticker] += shares
      holdings_by_day[0] += amount
      current_holdings[ticker] += amount
      purchases_yesterday[ticker] += amount
      purchases_by_year[current_year]['total'] += amount
      gsector = str(stocks_historic_data['sector'][pd.Timestamp(date)][ticker])
      purchases_by_year[current_year]['by_sector'][gsector] += amount
    if (len(other_access_errors_by_day) > 0 or len(current_transactions) > 0):
      # This if statement adds any errors when scanning through trades to the list of access errors
      other_access_errors_by_day.append(today_errors)
    days_until_first_trade+=1
    if (num_actual_transactions > 0):
      # Once at least one valid transaction occurs, this whole loop breaks
      days_until_first_trade-=1
      break

# second day onward:
  for date_index in range((days_until_first_trade+1), len(selected_trading_dates[(days_until_first_trade+1):])):
    if (current_year != selected_trading_dates[date_index].astype('M8[Y]')): # this if statement is for calendar year changes
      current_year = selected_trading_dates[date_index].astype('M8[Y]')
      annual_return[current_year] = 1.0
      purchases_by_year[current_year] = {'total': 0,
                                         'by_sector': {'10.0': 0,
                                                       '15.0': 0,
                                                       '20.0': 0,
                                                       '25.0': 0,
                                                       '30.0': 0,
                                                       '35.0': 0,
                                                       '40.0': 0,
                                                       '45.0': 0,
                                                       '50.0': 0,
                                                       '55.0': 0,
                                                       '60.0': 0,
                                                       'nan': 0}}
    date = pd.Timestamp(selected_trading_dates[date_index])
    date_yesterday = pd.Timestamp(selected_trading_dates[date_index-1])
    #print(date) # for debug
    #print(index_by_day[date_index-(days_until_first_trade+1)]) # for debug

    # first calculate gains of current stocks
    today_errors = "|"; # debug variable
    prev_holdings = sum(current_holdings.values())

    data_today = stocks_historic_data['close'][date] # closing prices today
    data_yesterday = stocks_historic_data['close'][date_yesterday] # closing prices yesterday
    data_today_high = stocks_historic_data['high'][date] # today's high
    data_yesterday_high = stocks_historic_data['high'][date_yesterday] # yesterday's high
    data_today_low = stocks_historic_data['low'][date] # today's low
    data_yesterday_low = stocks_historic_data['low'][date_yesterday] # yesterday's low

    for ticker in senate_data_tickers:
      if (current_holdings[ticker] > 0):
        price_today = None
        price_yesterday = None
        if ((data_today.get(ticker) != None) and (data_yesterday.get(ticker) != None)):
          amount_bought_before_yesterday = current_holdings[ticker] - purchases_yesterday[ticker]
          if execution == trade_execution.WORST_CASE:
            price_today = data_today_low[ticker]
            price_yesterday = \
              current_holdings[ticker]/(amount_bought_before_yesterday/data_yesterday_low[ticker]+purchases_yesterday[ticker]/data_yesterday_high[ticker])
            # weighted average of worst execution of yesterday's purchases with the purchases before that set to the worst-ending trade execution
          elif execution == trade_execution.BEST_CASE:
            price_today = data_today_high[ticker]
            price_yesterday = \
              current_holdings[ticker]/(amount_bought_before_yesterday/data_yesterday_high[ticker]+purchases_yesterday[ticker]/data_yesterday_low[ticker])
            # weighted average of best execution of yesterday's purchases with the purchases before that set to the best-ending trade execution
          else:
            price_today = data_today[ticker]
            price_yesterday = data_yesterday[ticker]
        else:
          today_errors += (str(ticker) + " at " + str(date) + "|") # for debug
        if ((price_today != None) and (price_yesterday != None) and (not math.isnan(price_today)) and (not math.isnan(price_yesterday))):
          current_holdings[ticker] *= (price_today/price_yesterday) # Multiply by change in holding value from last close to today's close

    senators_gain_loss_today = []
    if (senators_equally_weighed):
      # This is strictly when the index is weighing senators equally:
      for senator in selected_senators:
        senator_prev_holdings = 0
        senator_after_holdings = 0
        senator_shares = sum(shares_by_senator[senator].values())
        if (senator_shares == 0):
          continue
        for ticker in senate_data_tickers:
          if (shares_by_senator[senator][ticker] > 0):
            price_today = None
            price_yesterday = None
            if ((data_today.get(ticker) != None) and (data_yesterday.get(ticker) != None)):
              shares_bought_before_yesterday = shares_by_senator[senator][ticker] - shares_by_senator_yesterday[senator][ticker]
              if execution == trade_execution.WORST_CASE:
                price_today = data_today_low[ticker]
                price_yesterday = \
                  (shares_bought_before_yesterday*data_yesterday_low[ticker]+shares_by_senator_yesterday[senator][ticker]*data_yesterday_high[ticker])\
                  /shares_by_senator[senator][ticker]
                # weighted average of worst execution of yesterday's purchases with the purchases before that set to the worst-ending trade execution
              elif execution == trade_execution.BEST_CASE:
                price_today = data_today_high[ticker]
                price_yesterday = \
                  (shares_bought_before_yesterday*data_yesterday_high[ticker]+shares_by_senator_yesterday[senator][ticker]*data_yesterday_low[ticker])\
                  /shares_by_senator[senator][ticker]
                # weighted average of best execution of yesterday's purchases with the purchases before that set to the best-ending trade execution
              else:
                price_today = data_today[ticker]
                price_yesterday = data_yesterday[ticker]
            if ((price_today != None) and (price_yesterday != None) and (not math.isnan(price_today)) and (not math.isnan(price_yesterday))):
              # This calculates the current senator's value for the previous day and the current day to get a gain or loss later
              senator_prev_holdings = price_yesterday*shares_by_senator[senator][ticker]
              senator_after_holdings = price_today*shares_by_senator[senator][ticker]
        if (senator_prev_holdings != 0 and senator_after_holdings != 0):
          # This adds the gain or loss of the current senator to a list which is averaged later
          senators_gain_loss_today.append(senator_after_holdings/senator_prev_holdings)

    holdings_by_day.append(sum(current_holdings.values()))
    if (not senators_equally_weighed):
      gain_loss_by_day.append(holdings_by_day[-1]/prev_holdings)
    else:
      gain_loss_by_day.append(gain_loss_by_day[-1])
      if (len(senators_gain_loss_today) != 0):
        gain_loss_by_day[-1] = sum(senators_gain_loss_today)/len(senators_gain_loss_today)
    #print(gain_loss_by_day[-1]) # for debug
    index_by_day.append(index_by_day[-1]*gain_loss_by_day[-1]) # After the unrealized profit or loss is calculated it is applied to the last index value
    annual_return[current_year]*=gain_loss_by_day[-1]

    # the two if statements below are for debugging
    if (gain_loss_by_day[-1] >= 1.05 or gain_loss_by_day[-1] <= 0.95): # tracking big gains and losses to find possible bugs later
      big_gains_losses.loc[len(big_gains_losses)] = [date, index_by_day[-1], index_by_day[-2], ((gain_loss_by_day[-1]-1)*100), current_holdings]
    if (gain_loss_by_day[-1] == 1.00): # tracking level days to find possible bugs later
      level.loc[len(level)] = [date, index_by_day[-1], index_by_day[-2], current_holdings]

    for senator in selected_senators:
      shares_by_senator_yesterday[senator] = {}
      for ticker in senate_data_tickers:
        shares_by_senator_yesterday[senator][ticker] = 0
    for ticker in senate_data_tickers:
      purchases_yesterday[ticker] = 0

    # add valid purchases to holdings
    purchase_transactions = senate_data[(senate_data[date_column] == date) & (senate_data['transaction_type'] == "Purchase")]
    for row, row_str in purchase_transactions.iterrows():
      ticker = purchase_transactions.loc[row, 'ticker']
      senator = purchase_transactions.loc[row, 'senator']
      if (senator not in selected_senators):
        continue
      amount = purchase_transactions.loc[row, 'lower_bound_amount']
      if (data_today.get(ticker) != None):
        if execution == trade_execution.WORST_CASE:
          shares = amount/(data_today_high[ticker])
        elif execution == trade_execution.BEST_CASE:
          shares = amount/(data_today_low[ticker])
        else:
          shares = amount/(data_today[ticker])
      else:
        #print("Purchase error: " + str(ticker) + " at " + str(date)) # for debug
        purchase_sell_errors.append("Purchase error: " + str(ticker) + " at " + str(date)) # for debug
        continue
      shares_by_senator[senator][ticker] += shares
      shares_by_senator_yesterday[senator][ticker] += shares
      holdings_by_day[-1] += amount
      current_holdings[ticker] += amount
      purchases_yesterday[ticker] += amount
      purchases_by_year[current_year]['total'] += amount
      gsector = str(stocks_historic_data['sector'][date][ticker])
      purchases_by_year[current_year]['by_sector'][gsector] += amount
    
    # match sells to previous buys, then deduct them if sensible
    # this part is a bit more tricky
    # the sells match previous buys of the same senator by looking at that senator's shares in that stock
    sell_transactions = senate_data[(senate_data[date_column] == date) & (senate_data['transaction_type'] != "Purchase")]
    for row, row_str in sell_transactions.iterrows():
      ticker = sell_transactions.loc[row, 'ticker']
      amount = sell_transactions.loc[row, 'lower_bound_amount']
      senator = sell_transactions.loc[row, 'senator']
      if (senator not in selected_senators):
        continue
      transaction_type = sell_transactions.loc[row, 'transaction_type']
      if (data_today.get(ticker) != None):
        shares = amount/(data_today[ticker])
      else:
        #print("Sell error: " + str(ticker) + " at " + str(date)) # for debug
        purchase_sell_errors.append("Sell error: " + str(ticker) + " at " + str(date)) # for debug
        continue
      if (transaction_type == "Sell (Partial)" and shares_by_senator[senator][ticker] > shares):
        current_holdings[ticker] -= amount
        shares_by_senator[senator][ticker] -= shares
      elif ((transaction_type == "Sell (Full)" or transaction_type == "Sell (Partial)") and shares_by_senator[senator][ticker] > 0):
        # if a partial sale happens to exceed the holdings of a senator, then the senator's position is cleared
        current_holdings[ticker] -= (shares_by_senator[senator][ticker])*(data_today[ticker])
        shares_by_senator[senator][ticker] = 0.00

    #print("Errors: " + today_errors) # for debug
    other_access_errors_by_day.append(today_errors)

  #print("Index list: " + str(index_by_day))
  # return huge object of data
  return {'index_by_day': index_by_day,
          'purchases_by_year': purchases_by_year,
          'annual_return': annual_return,
          'days_until_first_trade': days_until_first_trade,
          'holdings_by_day': holdings_by_day,
          'gain_loss_by_day': gain_loss_by_day,
          'debug': {'purchase_sell_errors': purchase_sell_errors,
                    'other_access_errors_by_day': other_access_errors_by_day,
                    'big_gains_losses': big_gains_losses,
                    'level': level}}
Number of senators trading stocks in the data: 46

As more data was being extracted from the data set and as the calculator above was being improved, sometimes index calculations would shift quite a bit. Improvements in precision and accuracy would sometimes lead to significant differences in outcome, but this index calculator has become much more precise and accurate. It seems that the calculation/function becomes less accurate as time goes on as we have not taken everything into account in the stock market, such as when companies are acquired, merged, or go private.

Visualizing and drawing conclusions from the data

Now, for the moment of truth: Did the Senate outperform the S&P 500 from 2012-2021?

Senate Index vs. S&P 500 (and Nasdaq Composite) Line Graph:

To compare senators' investing ability against the market, we compared our calculated Senate indices with stock market indices, specifically the S&P 500 and Nasdaq Composite. Our Senate index function uses an initial reference index value of $100 and measures the change over time. We used the same overlapping time frame for the stock market indices and our Senate indices, 2012/09/13 to 2021/11/29. To fit our stock market indices to the initial 100 reference index, we took the stock market index on 2012/09/13, and for each consecutive day, we divided each day's value by the initial 2012/09/13 value and multiplied by 100. As a result, we have the Senate index plotted with the stock market indices fitted to have an initial value of 100 to measure their performances proportionally over time.

Using this graph, we can determine the performance of our Senate indices compared with multiple stock market indices and potentially help determine if the STOCK Act was effective in preventing senators from profiting off of insider trading.

In [11]:
default_senate_calculation = index_calculator()
default_senate_calculation_worst_case = index_calculator(execution=trade_execution.WORST_CASE)
default_senate_calculation_best_case = index_calculator(execution=trade_execution.BEST_CASE)
In [12]:
plt.rcParams['figure.figsize'] = (25,10)
In [13]:
# Import S&P 500 and NASDAQ data
sp_data = pd.read_csv("/content/drive/My Drive/CMSC320 Final Tutorial/sp_10_year_prices.csv")
nasdaq_data = pd.read_csv("/content/drive/My Drive/CMSC320 Final Tutorial/nasdaq_10_year_prices.csv")

# Convert Date values to datetime
sp_data['Date'] = pd.to_datetime(sp_data['Date'])
nasdaq_data['Date'] = pd.to_datetime(nasdaq_data['Date'])

# Identify first and last transaction date for the Senate Index
first_date = senate_data.head(1).transaction_date
last_date = senate_data.tail(1).transaction_date

# Sort S&P 500 and NASDAQ data by date
sp_data = sp_data.sort_values(by='Date', ascending=True)
nasdaq_data = nasdaq_data.sort_values(by='Date', ascending=True)

# Drop dates outside the time period for NASDAQ
nasdaq_data = nasdaq_data.drop(nasdaq_data.index[0:193])
nasdaq_data = nasdaq_data.drop(nasdaq_data.tail(14).index)

# Drop dates outside the time period for S&P 500
sp_data = sp_data.drop(sp_data.index[0:195])
sp_data = sp_data.drop(sp_data.tail(0).index)

# Grab inital value of S&P 500 and NASDAQ
initial_value = sp_data.iloc[0]['Close/Last']
initial_value_nasdaq = nasdaq_data.iloc[0]['Close/Last']

# Adjust inital values of S&P 500 and NASDAQ to $100
nasdaq_data['adj_value'] = (nasdaq_data['Close/Last']/initial_value_nasdaq)*100
sp_data['adj_value'] = (sp_data['Close/Last']/initial_value)*100

# Plot Adjusted S&P 500
date = sp_data['Date'].to_list()
value = sp_data['adj_value'].tolist()
plt.plot(date, value, color = "red", label = "S&P 500")

# Plot Adjusted NASDAQ
date = nasdaq_data['Date'].to_list()
value = nasdaq_data['adj_value'].tolist()
plt.plot(date, value, color = "green", label = "NASDAQ")

# Plot Senate Index
x= (trading_dates[default_senate_calculation['days_until_first_trade']:len(default_senate_calculation['index_by_day'])\
    +default_senate_calculation['days_until_first_trade']])
y= default_senate_calculation['index_by_day']
plt.plot(x, y, color="blue", label="Senate Stock Index")
# Best-Case
x= (trading_dates[default_senate_calculation_best_case['days_until_first_trade']:len(default_senate_calculation_best_case['index_by_day'])\
    +default_senate_calculation_best_case['days_until_first_trade']])
y= default_senate_calculation_best_case['index_by_day']
plt.plot(x, y, color="lightblue", label="Senate Stock Index Best-Case")
# Worst-Case
x= (trading_dates[default_senate_calculation_worst_case['days_until_first_trade']:len(default_senate_calculation_worst_case['index_by_day'])\
    +default_senate_calculation_worst_case['days_until_first_trade']])
y= default_senate_calculation_worst_case['index_by_day']
plt.plot(x, y, color="lightblue", label="Senate Stock Index Worst-Case")

plt.xlabel("Year")
plt.ylabel("Value ($)")
plt.title("S&P Index (Adjusted) vs Senate Stock Index")
plt.legend()
plt.show()

In the chart above, the blue line is the index representing the Senate's approximated performance, the light-blue lines being the same index for best- and worst-case trade execution rather than using the closing prices (e.g., best-case would be buying on the low of the day and selling on the high of the day, but the lower bound of the amount of stock traded each transaction is still used), the red line is the S&P 500 index, and the green line is the Nasdaq Composite index. The intuitive way to interpret the chart is: If you'd invested 100intoaportfoliomanagedbytheSenatein2012,yourinvestmentwouldbeworthalittleover200 now in 2021, doubling the value of your investment over 9.5 years. Not bad, right?

Wrong, because if you'd invested the same 100 into a low-cost exchange-traded fund (ETF) tracking the S&P 500 at the same time in 2012, your investment would be worth over300 now in 2021. Logically, you would've chosen to invest in the S&P 500 ETF instead of with the Senate since you would've made more money with less risk by just following the market.

Additionally, if you'd invested the same 100intoalowcostETFtrackingtheNasdaqCompositeatthesametimein2012,yourinvestmentwouldbeworthevenmorenowin2021about500.

Thus, the answer to our question, "How well do U.S. senators trade? Do they consistently outperform the S&P 500 index?", is that senators, when taken as a whole, have traded poorly and underperformed the market in almost calendar every year likely because the STOCK Act shed light on them and made clear that they may not engage in insider trading. After making it illegal for themselves to engage in insider trading, senators are no longer able to achieve the above-average gains that they achieved when they were allowed to engage in insider trading before the STOCK Act.

Below, we quantify our results and examine the annual percentage returns for the Senate and S&P 500 from 2013-2020.

In [14]:
annual_sp = {}
#annual_sp[2012] = (sp_data.iloc[77]['Close/Last']/sp_data.iloc[0]['Close/Last']) # Not a full calendar year
annual_sp[2013] = (sp_data.iloc[338]['Close/Last']/sp_data.iloc[78]['Close/Last'])
annual_sp[2014] = (sp_data.iloc[599]['Close/Last']/sp_data.iloc[339]['Close/Last'])
annual_sp[2015] = (sp_data.iloc[860]['Close/Last']/sp_data.iloc[600]['Close/Last'])
annual_sp[2016] = (sp_data.iloc[1129]['Close/Last']/sp_data.iloc[861]['Close/Last'])
annual_sp[2017] = (sp_data.iloc[1363]['Close/Last']/sp_data.iloc[1130]['Close/Last'])
annual_sp[2018] = (sp_data.iloc[1614]['Close/Last']/sp_data.iloc[1364]['Close/Last'])
annual_sp[2019] = (sp_data.iloc[1866]['Close/Last']/sp_data.iloc[1615]['Close/Last'])
annual_sp[2020] = (sp_data.iloc[2119]['Close/Last']/sp_data.iloc[1867]['Close/Last'])
#annual_sp[2021] = (sp_data.iloc[2348]['Close/Last']/sp_data.iloc[2120]['Close/Last']) # Not a full calendar year

# Create a dataframe storing the annual returns for the Senate and S&P 500 and the differences between them for easy comparison
returns = pd.DataFrame()
years = annual_sp.keys()
returns['Year'] = years
# Add the Senate's annual returns
senate_returns = default_senate_calculation['annual_return']
try:
  senate_returns.pop(np.datetime64("2012"))
  senate_returns.pop(np.datetime64("2021"))
except:
  pass

senate_returns_over_annual_sp_list = [senate_returns[i]/annual_sp[j] for i,j in zip(senate_returns, annual_sp)]
senate_returns_list = list(senate_returns.values())
annual_sp_list = list(annual_sp.values())
returns['Senate Return'] = [(str((factor-1)*100)+"%") for factor in senate_returns_list]
# Add the S&P 500's annual returns
returns['S&P 500 Return'] = [(str((factor-1)*100)+"%") for factor in annual_sp_list]
# Add a column storing the differences between the Senate and S&P 500's annual returns
returns['Senate Under/Outperformance'] = [(str((factor-1)*100)+"%") for factor in senate_returns_over_annual_sp_list]
print(returns)

# Calculate the average annual return for the Senate and S&P 500 and the difference
senate_avg_return = sum(senate_returns.values())/len(senate_returns)
sp_avg_return = sum(annual_sp.values())/len(annual_sp)
avg_difference = sum(senate_returns_over_annual_sp_list)/len(senate_returns_over_annual_sp_list)

print("Senate stock index average return by full calendar year: " + str((senate_avg_return-1)*100) + "%")
print("S&P 500 average return by full calendar year: "  + str((sp_avg_return-1)*100) + "%")
print("Average performance ratio of Senate to S&P 500 by full calendar year: " + str((avg_difference-1)*100) + "%")
   Year         Senate Return       S&P 500 Return Senate Under/Outperformance
0  2013    19.05808790795911%  29.601245275874867%         -8.135074123302699%
1  2014    6.366424649508939%  11.390638187366097%          -4.51044506038839%
2  2015   -5.010203861762042%  -0.726601583369757%         -4.314954808351457%
3  2016   17.383929532943387%  14.111673109218636%          2.867591311708151%
4  2017   27.875925283281823%  16.512905882711838%          9.752584329163172%
5  2018  -15.132846989715375%  -7.009396062778905%         -8.735776070903567%
6  2019    5.721781316130103%   28.71479623749518%         -17.86353674440041%
7  2020    9.390483956187223%  15.292907899381492%         -5.119503055942898%
Senate stock index average return by full calendar year: 8.206697724316658%
S&P 500 average return by full calendar year: 13.486021118237424%
Average performance ratio of Senate to S&P 500 by full calendar year: -4.507389277802254%

Senate Index: Equal-Weight Edition

What if the Senate was a hedge fund, and each senator had an equal say in what the hedge fund invested in? That is, if the hedge fund had 100milliontoinvest,theneachsenatorwoulddecidehowtoinvest1 million of it if all 100 senators participated (non-participants are simply excluded). Let's calculate an index that's equally weighed between the senators who were trading at any given time.

In [15]:
equally_weighed_senate_calculation = index_calculator(senators_equally_weighed=True)
equally_weighed_senate_calculation_worst_case = index_calculator(senators_equally_weighed=True, execution=trade_execution.WORST_CASE)
equally_weighed_senate_calculation_best_case = index_calculator(senators_equally_weighed=True, execution=trade_execution.BEST_CASE)
In [16]:
# Plot Adjusted S&P 500
date = sp_data['Date'].to_list()
value = sp_data['adj_value'].tolist()
plt.plot(date, value, color = "red", label = "S&P 500")

# Plot Adjusted NASDAQ
date = nasdaq_data['Date'].to_list()
value = nasdaq_data['adj_value'].tolist()
plt.plot(date, value, color = "green", label = "NASDAQ")

# Plot Senate Index
x= (trading_dates[equally_weighed_senate_calculation['days_until_first_trade']:len(equally_weighed_senate_calculation['index_by_day'])\
    +equally_weighed_senate_calculation['days_until_first_trade']])
y= equally_weighed_senate_calculation['index_by_day']
plt.plot(x, y, color="blue", label="Equally Weighed Senate Stock Index")
# Best-Case
x= (trading_dates[equally_weighed_senate_calculation_best_case['days_until_first_trade']:len(equally_weighed_senate_calculation_best_case['index_by_day'])\
    +equally_weighed_senate_calculation_best_case['days_until_first_trade']])
y= equally_weighed_senate_calculation_best_case['index_by_day']
plt.plot(x, y, color="lightblue", label="Equally Weighed Senate Stock Index Best-Case")
# Worst-Case
x= (trading_dates[equally_weighed_senate_calculation_worst_case['days_until_first_trade']:len(equally_weighed_senate_calculation_worst_case['index_by_day'])\
    +equally_weighed_senate_calculation_worst_case['days_until_first_trade']])
y= equally_weighed_senate_calculation_worst_case['index_by_day']
plt.plot(x, y, color="lightblue", label="Equally Weighed Senate Stock Index Worst-Case")

plt.xlabel("Year")
plt.ylabel("Value ($)")
plt.title("S&P Index (Adjusted) vs Senate Index (Equally Weighed)")
plt.legend()
plt.show()

The chart above is similar to the previous chart, but instead of calculating the holdings of all senators as one group, it weighs all senators with stock positions equally, i.e., if there are five senators with stocks, then each senator will be 20% of the index at that moment. The idea behind this index is that just because a senator has more money does not necessarily mean that they manage it better. For instance, some much older senators have had much more time to accumulate money. The equally-weighed index gives each senator an equal voice in the portfolio and allows us to see whether with the bias of wealth removed, how good senators are at picking stocks to buy and sell.

As you can see, when each senator gets an equal voice in determining which stocks to buy and sell, the Senate performs better as a whole. Although the Senate would still underperform the S&P 500 and Nasdaq Composite from 2020-2021, it would outperform the S&P 500 at least from 2014-2019.

Grouped Bar Chart

To analyze and compare the performance of the Senate's returns, we plotted a grouped bar chart of annual returns from the original Senate index and the S&P 500. Alongside the percent annual returns from 2013-2020 for the Senate and S&P 500, the green bar represents the performance of the Senate annual returns relative to the S&P 500.

As indicated by the grouped bar chart, the Senate's annual returns tended to underperform compared to the S&P 500, except in 2016 and 2017. For the rest of these years, the Senate had a negative return compared to if they had instead invested in the S&P 500.

In [17]:
labels = years

x = np.arange(len(labels))  # the label locations
width = 0.2  # the width of the bars

labels = [0]
for year in years:
  labels.append(year)

fig, ax = plt.subplots()
senate_returns_list = [(factor-1)*100 for factor in senate_returns_list]
annual_sp_list = [(factor-1)*100 for factor in annual_sp_list]
senate_returns_over_annual_sp_list = [(factor-1)*100 for factor in senate_returns_over_annual_sp_list]
rects1 = ax.bar(x - width, senate_returns_list, width, label='Senate Return')
rects2 = ax.bar(x, annual_sp_list, width, label='S&P 500 Return')
rects3 = ax.bar(x + width, senate_returns_over_annual_sp_list, width, label='Senate Under/Outperformance')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel("Percent Annual Return")
ax.set_xlabel("Year")
ax.set_title("Annual Return of S&P 500 and Senate")
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()

plt.show()

Pie Charts

To explore what senators commonly invested in, we constructed multiple pie charts to analyze the purchased stocks per sector per calendar year. Using our Senate data from SenateStockWatcher.com and ticker data from Compustat with GICS® sector values, we graphed each sector's purchase percentage for that year.

From our pie charts, you can see that Utilities and Real Estate weren't commonly purchased sectors while Energy, Materials, Industrials, Financials, Information Technology, and Health Care were consistently purchased more frequently. Interestingly, 2014, 2016, 2017, and 2018 showcased the most equally distributed purchases by sector.

In [18]:
# Pie charts of sector purchases each year

xlabels=["Energy",
         "Materials",
         "Industrials",
         "Consumer Discretionary",
         "Consumer Staples",
         "Health Care",
         "Financials",
         "Information Technology",
         "Communication Service",
         "Utilities",
         "Real Estate",
         "Other"]

# Collect sector data for each yera
for year in default_senate_calculation['purchases_by_year']:
  total = default_senate_calculation['purchases_by_year'][year]['total']
  keys = default_senate_calculation['purchases_by_year'][year]['by_sector'].keys()
  values = default_senate_calculation['purchases_by_year'][year]['by_sector'].values()
  y = []
  # Plot Pie graph
  for v in values:
    y.append((v/total) * 100)

  df = pd.DataFrame({'percent annual purchases': y}, index=xlabels)
  plot = df.plot.pie(y="percent annual purchases",
                     colors=["red", "green", "orange", "blue", "cyan", "pink", "purple", "gray", "brown", "black", "lime", "lightsteelblue"],
                     figsize=(15, 15))
  plt.title(str(year)+" Sector Purchases")

Line Graph

Another way we visualize the sector distribution is using a multi-line graph that also tracks each GICS® sector's purchase percentage over each calendar year.

From our line graph, we see that Materials, Financials, Information Technology, and other sectors outside the GICS® sectors increased in percent of annual purchases from 2019 to 2021. While it appears that senators mostly purchased stocks within the Materials and Financials sectors in 2012-2013, the data from that period is very limited and therefore shouldn't be strongly considered when analyzing the sector purchase distributions over time.

In [19]:
# Line graph of sector purchases each year

fig, ax = plt.subplots(figsize=(30,15))
xlabels=["Energy",
         "Materials",
         "Industrials",
         "Consumer Discretionary",
         "Consumer Staples",
         "Health Care",
         "Financials",
         "Information Technology",
         "Communication Service",
         "Utilities",
         "Real Estate",
         "Other"]
tics = ["10.0", "15.0", "20.0", "25.0", "30.0", "35.0", "40.0", "45.0", "50.0", "55.0", "60.0", "nan"]
years = list(default_senate_calculation['purchases_by_year'].keys())
sector_data = {}
for tic in tics:
  sector_data[tic] = []

# Calculate percent annual purchase for each sector per year
for year in default_senate_calculation['purchases_by_year']:
  for s in default_senate_calculation['purchases_by_year'][year]['by_sector'].keys():
    sector_data[s].append(default_senate_calculation['purchases_by_year'][year]['by_sector'][s]/(default_senate_calculation['purchases_by_year'][year]['total']))

# Plot line graph
for tic in tics:
  plt.plot(years, sector_data[tic], label = xlabels.pop(0))

plt.legend()
plt.title("Yearly Sector Purchases")
plt.xlabel("Year")
plt.ylabel("Percent Annual Purchase")
plt.show()
In [20]:
print(first_date)
print(last_date)
7706   2012-09-13
Name: transaction_date, dtype: datetime64[ns]
0   2021-11-29
Name: transaction_date, dtype: datetime64[ns]


Validating Our Results

Our results align with those in an April 2020 paper published under the National Bureau of Economic Research by a team from Dartmouth College. The research paper's team consisted of an economics professor and two students who studied senators' stock-trading performance between 2012—when the STOCK Act was passed—and March 2020, after the congressional insider trading scandal came to light. Highlights of their study include that senators' stock purchases underperformed at the six-month outset by -0.17% in the same industry and stock size while also underperforming at shorter intervals, but their stock purchases slightly outperformed by +0.14% at the one-year mark.

You can find the MarketWatch article on the study here.

And the link to the PDF of the study here.


Conclusion

How well do U.S. senators trade? Do they consistently outperform the S&P 500 index?

Based on our line graphs with our calculated Senate indices and Adjusted S&P 500 and NASDAQ market indices, we can see that senators are consistently underperforming compared to the market indices from 2012-2021. Therefore if someone were to compare investing 100intowhatsenatorsinvestedinsince2012,itwouldonlybeworthalittleabove200. Instead, if someone were to purchase a single ETF tracking the S&P 500 or NASDAQ, also at 100,itwouldbecurrentlyworthover300 or about $500, respectively.

Is the STOCK Act preventing U.S. senators from benefiting from insider trading?

While our analysis can't give a definitive answer on whether the STOCK Act prevented senators from benefiting from insider trading, our analysis does provide evidence that it did. When comparing our Senate index with the S&P 500 and NASDAQ, senators consistently underperformed the market indices. Given that senators were no longer able to beat the market as they had before the STOCK Act in the 1990s and the 2000s, this suggests that the STOCK Act may have been effective in preventing senators from benefiting from insider trading since the Senate underperformed after the STOCK Act.

Getting stock data without Compustat

Finally, while we used Compustat to access stock data, not everyone has access to it, so polygon.io and/or scraping data from TradingView can be used instead.